{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 模型构造\n",
    "\n",
    "让我们回顾一下在[“多层感知机的简洁实现”](../chapter_deep-learning-basics/mlp-gluon.ipynb)一节中含单隐藏层的多层感知机的实现方法。我们首先构造`Sequential`实例，然后依次添加两个全连接层。其中第一层的输出大小为256，即隐藏层单元个数是256；第二层的输出大小为10，即输出层单元个数是10。我们在上一章的其他\n",
    "节中也使用了`Sequential`类构造模型。这里我们介绍另外一种基于`tf.keras.Model`类的模型构造方法：它让模型构造更加灵活。\n",
    "\n",
    "\n",
    "## 4.1.1 build model from block\n",
    "\n",
    "`tf.keras.Model`类是`tf.keras`模块里提供的一个模型构造类，我们可以继承它来定义我们想要的模型。下面继承`tf.keras.Model`类构造本节开头提到的多层感知机。这里定义的`MLP`类重载了`tf.keras.Model`类的`__init__`函数和`call`函数。它们分别用于创建模型参数和定义前向计算。前向计算也即正向传播。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.0.0\n"
     ]
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "import numpy as np\n",
    "print(tf.__version__)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MLP(tf.keras.Model):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.flatten = tf.keras.layers.Flatten()    # Flatten层将除第一维（batch_size）以外的维度展平\n",
    "        self.dense1 = tf.keras.layers.Dense(units=256, activation=tf.nn.relu)\n",
    "        self.dense2 = tf.keras.layers.Dense(units=10)\n",
    "\n",
    "    def call(self, inputs):         \n",
    "        x = self.flatten(inputs)   \n",
    "        x = self.dense1(x)    \n",
    "        output = self.dense2(x)     \n",
    "        return output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "以上的`MLP`类中无须定义反向传播函数。系统将通过自动求梯度而自动生成反向传播所需的`backward`函数。\n",
    "\n",
    "我们可以实例化`MLP`类得到模型变量`net`。下面的代码初始化`net`并传入输入数据`X`做一次前向计算。其中，`net(X)`将调用`MLP`类定义的`call`函数来完成前向计算。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=62, shape=(2, 10), dtype=float32, numpy=\n",
       "array([[ 0.15637134,  0.14062534, -0.11187253, -0.13151687,  0.12066578,\n",
       "         0.15376692,  0.03429577,  0.07023033, -0.12030508, -0.38496107],\n",
       "       [-0.02877349,  0.1088542 , -0.20668823,  0.08241277,  0.06292161,\n",
       "         0.25310248,  0.04884301,  0.27015388, -0.13183925, -0.23431192]],\n",
       "      dtype=float32)>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = tf.random.uniform((2,20))\n",
    "net = MLP()\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1.2 Sequential\n",
    "\n",
    "我们刚刚提到，`tf.keras.Model`类是一个通用的部件。事实上，`Sequential`类继承自`tf.keras.Model`类。当模型的前向计算为简单串联各个层的计算时，可以通过更加简单的方式定义模型。这正是`Sequential`类的目的：它提供`add`函数来逐一添加串联的`Block`子类实例，而模型的前向计算就是将这些实例按添加的顺序逐一计算。\n",
    "\n",
    "我们用Sequential类来实现前面描述的MLP类，并使用随机初始化的模型做一次前向计算。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=117, shape=(2, 10), dtype=float32, numpy=\n",
       "array([[-0.42563885, -0.11981717,  0.0838763 ,  0.04553887,  0.09710997,\n",
       "         0.16843301,  0.15290505, -0.00364013, -0.13743742, -0.36868355],\n",
       "       [-0.37125233, -0.18243487,  0.24916942, -0.04006755,  0.06090571,\n",
       "         0.05331742,  0.24555533, -0.03183865, -0.10122052, -0.11752242]],\n",
       "      dtype=float32)>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(256, activation=tf.nn.relu),\n",
    "    tf.keras.layers.Dense(10),\n",
    "])\n",
    "\n",
    "model(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1.3 build complex model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "虽然`Sequential`类可以使模型构造更加简单，且不需要定义`call`函数，但直接继承`tf.keras.Model`类可以极大地拓展模型构造的灵活性。下面我们构造一个稍微复杂点的网络`FancyMLP`。在这个网络中，我们通过`constant`函数创建训练中不被迭代的参数，即常数参数。在前向计算中，除了使用创建的常数参数外，我们还使用`tensor`的函数和Python的控制流，并多次调用相同的层。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class FancyMLP(tf.keras.Model):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.flatten = tf.keras.layers.Flatten()\n",
    "        self.rand_weight = tf.constant(\n",
    "            tf.random.uniform((20,20)))\n",
    "        self.dense = tf.keras.layers.Dense(units=20, activation=tf.nn.relu)\n",
    "\n",
    "    def call(self, inputs):         \n",
    "        x = self.flatten(inputs)   \n",
    "        x = tf.nn.relu(tf.matmul(x, self.rand_weight) + 1)\n",
    "        x = self.dense(x)    \n",
    "        while tf.norm(x) > 1:\n",
    "            x /= 2\n",
    "        if tf.norm(x) < 0.8:\n",
    "            x *= 10\n",
    "        return tf.reduce_sum(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在这个`FancyMLP`模型中，我们使用了常数权重`rand_weight`（注意它不是模型参数）、做了矩阵乘法操作（`tf.matmul`）并重复使用了相同的`Dense`层。下面我们来测试该模型的随机初始化和前向计算。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=220, shape=(), dtype=float32, numpy=24.381481>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net = FancyMLP()\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为`FancyMLP`和`Sequential`类都是`tf.keras.Model`类的子类，所以我们可以嵌套调用它们。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor: id=403, shape=(), dtype=float32, numpy=3.2303767>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "class NestMLP(tf.keras.Model):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.net = tf.keras.Sequential()\n",
    "        self.net.add(tf.keras.layers.Flatten())\n",
    "        self.net.add(tf.keras.layers.Dense(64, activation=tf.nn.relu))\n",
    "        self.net.add(tf.keras.layers.Dense(32, activation=tf.nn.relu))\n",
    "        self.dense = tf.keras.layers.Dense(units=16, activation=tf.nn.relu)\n",
    "\n",
    "    \n",
    "    def call(self, inputs):         \n",
    "        return self.dense(self.net(inputs))\n",
    "\n",
    "net = tf.keras.Sequential()\n",
    "net.add(NestMLP())\n",
    "net.add(tf.keras.layers.Dense(20))\n",
    "net.add(FancyMLP())\n",
    "\n",
    "net(X)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "tf2.0",
   "language": "python",
   "name": "tf2.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
